📝 Walkthrough

This PR introduces a comprehensive Final Answer Synthesis module to orchestrate end-to-end generation of structured user responses, with support for single-shot and sectional modes, alongside a significant refactor of model metadata handling with caching, normalization, and token limit resolution.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent as Agent Flow
    participant Synth as Final Answer Synthesis
    participant Planner as Section Planner
    participant Mapper as Fragment Mapper
    participant LLM as Language Model
    participant Store as Result/Stream
    Agent->>Synth: executeFinalSynthesis(context)
    Synth->>Synth: decideSynthesisMode(budget)
    alt Mode: Sectional
        Synth->>Planner: planSections(fragments, budget)
        Planner->>LLM: generate section plan
        LLM-->>Planner: section definitions
        Planner-->>Synth: SectionPlanResult
        Synth->>Mapper: mapFragmentsToSections(fragments, plan)
        Mapper->>LLM: assign fragments to sections
        LLM-->>Mapper: mappings
        Mapper-->>Synth: SectionMappingEnvelope
        loop Per Section
            Synth->>Synth: selectImagesForSection(budget)
            Synth->>LLM: synthesizeSection(section, fragments, images)
            LLM-->>Synth: section answer
            Synth->>Store: stream section result
        end
    else Mode: Single
        Synth->>Synth: selectImagesForFinalSynthesis(budget)
        Synth->>LLM: synthesizeSingleAnswer(fragments, images)
        LLM-->>Synth: final answer
        Synth->>Store: stream answer chunks
    end
    Synth-->>Agent: FinalSynthesisExecutionResult
```
```mermaid
sequenceDiagram
    participant App as Application
    participant Cache as Model Info Cache
    participant API as External Model API
    participant Norm as Normalization
    participant Store as Normalized Metadata
    App->>Cache: fetchNormalizedModelMetadata(forceRefresh)
    alt Cache Valid
        Cache-->>App: cached metadata
    else Cache Invalid
        Cache->>API: fetchModelInfoFromAPI()
        API-->>Cache: raw model records
        Cache->>Norm: normalizeModelInfoRecords(raw)
        Norm->>Norm: resolve conflicts, compute limits
        Norm-->>Store: NormalizedModelMetadata
        Store-->>App: normalized metadata
    end
    App->>App: getModelTokenLimits(modelId)
    App->>App: getEffectiveMaxOutputTokens(requested)
```
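The metadata flow above hinges on a TTL check before hitting the API. A minimal sketch of that gate, assuming a simple `{ data, timestamp }` cache shape and a one-hour TTL (both illustrative, not the module's actual definitions):

```typescript
// Sketch only: cache shape, TTL value, and function names are assumptions.
interface ModelInfoCache<T> {
  data: T
  timestamp: number
}

const CACHE_TTL_MS = 60 * 60 * 1000 // assumed 1-hour TTL

// Return cached data while it is fresh; null signals the caller to fetch,
// normalize, and rewrite the cache entry.
function readCache<T>(
  cache: ModelInfoCache<T> | null,
  forceRefresh: boolean,
  now: number = Date.now(),
): T | null {
  if (forceRefresh || cache === null) return null
  return now - cache.timestamp < CACHE_TTL_MS ? cache.data : null
}
```

When `readCache` returns null, the caller follows the "Cache Invalid" branch of the diagram: fetch raw records, normalize them, and store the result with a fresh timestamp.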
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

The PR spans multiple interconnected systems (final synthesis orchestration, model metadata normalization, provider integration) with significant logic density. The new final-answer-synthesis module introduces intricate budgeting, fallback handling, and streaming coordination across 1700+ lines. The fetchModels refactor introduces a new caching/normalization layer with conflict detection and token resolution logic. Integration points across message-agents, provider/base, and workflow require cross-file reasoning. However, changes are well-scoped and cohesive rather than scattered.
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the AI agent's ability to generate final answers, particularly when dealing with extensive evidence. By introducing a flexible synthesis pipeline that can dynamically switch between single and sectional processing modes, the system can now efficiently handle large context windows, improve answer quality, and reduce the risk of context length errors. The changes also centralize and streamline the underlying logic, making the system more scalable and easier to maintain.
Code Review
This pull request introduces a new, sophisticated final answer synthesis mechanism, primarily implemented in a new file server/api/chat/final-answer-synthesis.ts. This module intelligently decides between a single-shot or a multi-sectional approach for generating answers, especially when dealing with large amounts of context and images, to stay within model token limits. It includes logic for planning sections, mapping fragments to these sections, and synthesizing each section concurrently. To support this, server/ai/modelConfig.ts was updated to store and retrieve maximum input token limits for various models, and server/shared/types.ts was modified to include maxInputTokens in the ModelConfiguration interface. The existing server/api/chat/message-agents.ts file was refactored to delegate the entire final synthesis process to the new module, simplifying its code. Additionally, a new test file server/tests/finalAnswerSynthesis.test.ts was added to validate the new synthesis logic, covering fragment preview generation, sectional payload construction, mode selection, and fragment budget management. A minor indentation issue was noted in the selectMappedEntriesWithinBudget function call within the new synthesis file.
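The single-shot vs. multi-sectional decision described above reduces to a budget comparison. A hypothetical sketch of that decision (names and signature are illustrative, not the module's actual API):

```typescript
type SynthesisMode = "single" | "sectional"

// Illustrative only: answer in one shot when the estimated prompt payload
// fits the model's safe input budget; otherwise split into sections.
function decideSynthesisMode(
  estimatedPayloadTokens: number,
  safeInputBudget: number,
): SynthesisMode {
  return estimatedPayloadTokens <= safeInputBudget ? "single" : "sectional"
}
```

The sectional path then plans sections, maps fragments to them, and synthesizes each section within its own budget slice.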
Note: Security Review did not run due to the size of the PR.
```ts
selectMappedEntriesWithinBudget(
  orderedEntries,
  baseTokens,
  safeInputBudget,
)
```
There's a minor indentation issue here that affects readability. The arguments to `selectMappedEntriesWithinBudget` are indented incorrectly.
Suggested change:

```ts
selectMappedEntriesWithinBudget(
  orderedEntries,
  baseTokens,
  safeInputBudget,
)
```
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/api/chat/message-agents.ts (1)
Lines 3092-3118: ⚠️ Potential issue | 🟠 Major

Reset the full final-synthesis state on every non-success exit.

`review.lockedByFinalSynthesis` is set before the streaming-channel guard and never cleared on the early return or either catch path. After a recoverable synthesis failure, the run will skip all later reviews/replanning, and the leftover `streamedText`/`ackReceived` can leak partial output into a retry or delegated fallback.

Proposed fix

```diff
 async execute(_args, context) {
   const mutableContext = mutableAgentContext(context)
-  if (!mutableContext.review.lockedByFinalSynthesis) {
-    mutableContext.review.lockedByFinalSynthesis = true
-    mutableContext.review.lockedAtTurn =
-      mutableContext.turnCount ?? MIN_TURN_NUMBER
-    loggerWithChild({ email: context.user.email }).info(
-      {
-        chatId: context.chat.externalId,
-        turn: mutableContext.review.lockedAtTurn,
-      },
-      "[MessageAgents][FinalSynthesis] Review lock activated after synthesis tool call."
-    )
-  }
+  const resetFinalSynthesisState = () => {
+    mutableContext.finalSynthesis.requested = false
+    mutableContext.finalSynthesis.completed = false
+    mutableContext.finalSynthesis.suppressAssistantStreaming = false
+    mutableContext.finalSynthesis.streamedText = ""
+    mutableContext.finalSynthesis.ackReceived = false
+    mutableContext.review.lockedByFinalSynthesis = false
+    mutableContext.review.lockedAtTurn = null
+  }
+
   if (
     mutableContext.finalSynthesis.requested &&
     mutableContext.finalSynthesis.completed
   ) {
     return ToolResponse.error(
@@
   if (!mutableContext.runtime?.streamAnswerText) {
     return ToolResponse.error(
       "EXECUTION_FAILED",
       "Streaming channel unavailable. Cannot deliver final answer."
     )
   }
+
+  if (!mutableContext.review.lockedByFinalSynthesis) {
+    mutableContext.review.lockedByFinalSynthesis = true
+    mutableContext.review.lockedAtTurn =
+      mutableContext.turnCount ?? MIN_TURN_NUMBER
+    loggerWithChild({ email: context.user.email }).info(
+      {
+        chatId: context.chat.externalId,
+        turn: mutableContext.review.lockedAtTurn,
+      },
+      "[MessageAgents][FinalSynthesis] Review lock activated after synthesis tool call."
+    )
+  }
@@
   } catch (error) {
+    resetFinalSynthesisState()
     if (isMessageAgentStopError(error)) {
-      context.finalSynthesis.suppressAssistantStreaming = false
-      context.finalSynthesis.requested = false
-      context.finalSynthesis.completed = false
       throw error
     }
-
-    context.finalSynthesis.suppressAssistantStreaming = false
-    context.finalSynthesis.requested = false
-    context.finalSynthesis.completed = false
     loggerWithChild({ email: context.user.email }).error(
       { err: error instanceof Error ? error.message : String(error) },
       "Final synthesis tool failed."
     )
```

Also applies to: 3160-3179
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/api/chat/message-agents.ts` around lines 3092 - 3118, The final-synthesis state (e.g., mutableContext.review.lockedByFinalSynthesis, mutableContext.review.lockedAtTurn and all fields under mutableContext.finalSynthesis such as streamedText and ackReceived) is set before the streaming-channel guard but not cleared on early returns or failure paths; update the control flow so that any non-success exit (the early return when runtime?.streamAnswerText is false, the "Final synthesis already completed" path, and all catch/failure branches around the synthesis call) resets the entire final-synthesis state to its initial/empty values and clears the review lock (unset lockedByFinalSynthesis and lockedAtTurn) to avoid leaking partial output into retries or later stages, ensuring you touch the same symbols (mutableContext.review.lockedByFinalSynthesis, mutableContext.review.lockedAtTurn, mutableContext.finalSynthesis.streamedText, mutableContext.finalSynthesis.ackReceived, and mutableContext.finalSynthesis.requested/completed).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@server/api/chat/message-agents.ts`:
- Around line 3092-3118: The final-synthesis state (e.g.,
mutableContext.review.lockedByFinalSynthesis, mutableContext.review.lockedAtTurn
and all fields under mutableContext.finalSynthesis such as streamedText and
ackReceived) is set before the streaming-channel guard but not cleared on early
returns or failure paths; update the control flow so that any non-success exit
(the early return when runtime?.streamAnswerText is false, the "Final synthesis
already completed" path, and all catch/failure branches around the synthesis
call) resets the entire final-synthesis state to its initial/empty values and
clears the review lock (unset lockedByFinalSynthesis and lockedAtTurn) to avoid
leaking partial output into retries or later stages, ensuring you touch the same
symbols (mutableContext.review.lockedByFinalSynthesis,
mutableContext.review.lockedAtTurn, mutableContext.finalSynthesis.streamedText,
mutableContext.finalSynthesis.ackReceived, and
mutableContext.finalSynthesis.requested/completed).
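The reset-on-every-non-success-exit pattern the prompt asks for can be factored into a single helper invoked on each failure path. A simplified, self-contained sketch (the state shape is abbreviated from the fields named in the comment; the runner function is hypothetical):

```typescript
interface FinalSynthesisState {
  requested: boolean
  completed: boolean
  streamedText: string
  ackReceived: boolean
}

function initialFinalSynthesisState(): FinalSynthesisState {
  return { requested: false, completed: false, streamedText: "", ackReceived: false }
}

// Run the synthesis step and guarantee the whole state resets on failure,
// so a retry never observes leftover partial output.
function runWithStateReset(
  state: FinalSynthesisState,
  synthesize: () => string,
): string | null {
  state.requested = true
  try {
    state.streamedText = synthesize()
    state.completed = true
    return state.streamedText
  } catch {
    Object.assign(state, initialFinalSynthesisState())
    return null
  }
}
```

The key design point is that the reset touches every field in one place; scattering individual assignments across catch blocks is how fields like `ackReceived` get missed.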
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0bc99031-2b58-42ec-bc34-f430af9bf1ad
📒 Files selected for processing (6)
- server/ai/modelConfig.ts
- server/api/chat/final-answer-synthesis.ts
- server/api/chat/message-agents.ts
- server/logger/index.ts
- server/shared/types.ts
- server/tests/finalAnswerSynthesis.test.ts
Actionable comments posted: 3
🧹 Nitpick comments (5)
server/ai/provider/base.ts (1)
Lines 23-33: Redundant fallback chain for `modelId`.

Line 25 already ensures `actualModelId` falls back to `resolvedModelId` (which itself falls back to `defaultFastModel`), but line 33 adds another `|| defaultFastModel` fallback. This is unreachable.

🧹 Simplify redundant fallback

```diff
-  const actualModelId = modelConfig?.actualName || resolvedModelId || defaultFastModel
+  const actualModelId = modelConfig?.actualName || resolvedModelId
   const requestedMaxTokens = params.max_new_tokens || 1024 * 8
   return {
     maxTokens:
       getEffectiveMaxOutputTokens(resolvedModelId, requestedMaxTokens) ??
       requestedMaxTokens,
     topP: params.top_p || 0.9,
     temperature: params.temperature || 0.6,
-    modelId: actualModelId || defaultFastModel,
+    modelId: actualModelId,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/ai/provider/base.ts` around lines 23 - 33, The returned modelId has an unreachable fallback: `actualModelId` is already computed as `modelConfig?.actualName || resolvedModelId || defaultFastModel`, so remove the redundant `|| defaultFastModel` in the return object; update the return's `modelId` to just use `actualModelId` (ensuring `resolvedModelId`, `modelConfig`, `actualModelId`, and `defaultFastModel` remain unchanged).

server/tests/fetchModels.test.ts (1)
Lines 94-111: Consider adding a test for missing model metadata entirely.

The test covers the case where the model exists but has null token limits. Consider adding a test for when `getModelTokenLimits` is called with a model ID that doesn't exist in the cache at all, to verify the warning behavior.

🧪 Optional: Add test for completely missing model

```ts
test("warns and returns defaults for model not in cache", () => {
  __modelInfoInternals.setModelInfoCacheForTests([])
  const limits = getModelTokenLimits("nonexistent-model")
  expect(limits).toEqual({
    maxInputTokens: 128_000,
    maxOutputTokens: undefined,
  })
  // Optionally verify warning was logged if you have a way to capture logs
})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/tests/fetchModels.test.ts` around lines 94 - 111, Add a test that verifies behavior when the model ID is not present in the cache: call __modelInfoInternals.setModelInfoCacheForTests([]) then call getModelTokenLimits with a nonexistent model ID and assert it returns the generic defaults (maxInputTokens 128_000 and maxOutputTokens undefined); also optionally call getEffectiveMaxOutputTokens with a sample requested output (e.g., 1500) to assert it preserves the requested output and, if feasible, assert a warning was emitted.

server/ai/fetchModels.ts (3)
Lines 456-468: Silent clamping of `requestedMaxTokens` without caller notification.

`getEffectiveMaxOutputTokens` silently clamps the requested value to `maxOutputTokens` using `Math.min()`. Callers passing values larger than the model's upstream maximum (e.g., 8192 when max is 4096) won't know their request was reduced.

Consider logging when clamping occurs or returning additional metadata.

📝 Optional: Add debug logging when clamping

```diff
 export const getEffectiveMaxOutputTokens = (
   modelId: Models | string | null | undefined,
   requestedMaxTokens?: number,
 ): number | undefined => {
   if (requestedMaxTokens === undefined) {
     return undefined
   }
   const { maxOutputTokens } = getModelTokenLimits(modelId)
-  return maxOutputTokens !== undefined
-    ? Math.min(requestedMaxTokens, maxOutputTokens)
-    : requestedMaxTokens
+  if (maxOutputTokens !== undefined && requestedMaxTokens > maxOutputTokens) {
+    Logger.debug(
+      { modelId, requestedMaxTokens, maxOutputTokens },
+      "[ModelInfo] Clamped requested max tokens to model limit."
+    )
+    return maxOutputTokens
+  }
+  return requestedMaxTokens
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/ai/fetchModels.ts` around lines 456 - 468, getEffectiveMaxOutputTokens silently reduces requestedMaxTokens to the model's maxOutputTokens (from getModelTokenLimits) which can surprise callers; update getEffectiveMaxOutputTokens to notify when clamping occurs by either (a) emitting a debug/process logger message that includes the original requestedMaxTokens, the modelId, and the applied maxOutputTokens when Math.min reduces the value, or (b) change the function signature to return metadata (e.g., { effectiveMaxTokens, wasClamped, allowedMax }) so callers can react; locate the logic in getEffectiveMaxOutputTokens and the getModelTokenLimits call and implement one of these notification approaches consistently across usages.
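Option (b) from the prompt above, returning clamp metadata instead of logging, could look like this sketch (the `ClampResult` shape and function name are illustrative, not the module's actual API):

```typescript
// Illustrative metadata-returning clamp, per the review's option (b).
interface ClampResult {
  effectiveMaxTokens: number
  wasClamped: boolean
  allowedMax?: number
}

// Clamp a requested output-token count to the model's limit and report
// whether clamping happened, so callers can log or adjust their request.
function clampMaxOutputTokens(
  requestedMaxTokens: number,
  modelMaxOutputTokens?: number,
): ClampResult {
  if (
    modelMaxOutputTokens !== undefined &&
    requestedMaxTokens > modelMaxOutputTokens
  ) {
    return {
      effectiveMaxTokens: modelMaxOutputTokens,
      wasClamped: true,
      allowedMax: modelMaxOutputTokens,
    }
  }
  return {
    effectiveMaxTokens: requestedMaxTokens,
    wasClamped: false,
    allowedMax: modelMaxOutputTokens,
  }
}
```

The trade-off versus logging: metadata changes the function signature and every call site, while a debug log keeps the signature stable but leaves callers unaware at the type level.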
Lines 72-74: Module-level mutable state without synchronization.

The `modelInfoCache` and `warnedMissingTokenLimitKeys` are module-level mutable variables. While JavaScript/TypeScript is single-threaded, concurrent async operations (e.g., multiple simultaneous `fetchModelInfoFromAPI` calls) could lead to race conditions where one call overwrites the cache while another is reading it.

Consider using a mutex pattern or ensuring only one fetch is in-flight at a time.

♻️ Proposed pattern to deduplicate in-flight requests

```diff
 let modelInfoCache: ModelInfoCache | null = null
+let inFlightFetch: Promise<RawModelInfoRecord[]> | null = null
 const warnedMissingTokenLimitKeys = new Set<string>()
```

Then in `fetchModelInfoFromAPI`:

```diff
 export async function fetchModelInfoFromAPI(
   forceRefresh = false,
 ): Promise<RawModelInfoRecord[]> {
   if (!forceRefresh && modelInfoCache) {
     const age = Date.now() - modelInfoCache.timestamp
     if (age < CACHE_TTL_MS) {
       return modelInfoCache.rawData
     }
   }
+  if (inFlightFetch) {
+    return inFlightFetch
+  }
+
+  inFlightFetch = (async () => {
+    try {
       // ... existing fetch logic ...
+    } finally {
+      inFlightFetch = null
+    }
+  })()
+
+  return inFlightFetch
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/ai/fetchModels.ts` around lines 72 - 74, modelInfoCache and warnedMissingTokenLimitKeys are module-level mutable state that can race when multiple fetchModelInfoFromAPI calls run concurrently; change the implementation to deduplicate and synchronize in-flight fetches by storing an in-flight Promise (e.g., inFlightModelInfoPromise) keyed by model id or a single mutex-like Promise, so fetchModelInfoFromAPI checks for an existing in-flight Promise and awaits it instead of starting a new network call, only updating modelInfoCache and warnedMissingTokenLimitKeys once the Promise resolves; ensure any rejection clears the in-flight entry so subsequent calls can retry and keep references to the unique symbols modelInfoCache, warnedMissingTokenLimitKeys, and fetchModelInfoFromAPI when locating the code to modify.
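The deduplication the prompt describes is the classic single-flight pattern, and it can be tested in isolation with a generic wrapper. A sketch under that assumption (not the module's actual code):

```typescript
// Wrap an async fetcher so concurrent callers share one in-flight Promise.
// The slot is cleared on settlement (including rejection) so later calls
// can retry; caching the resolved result stays the caller's responsibility.
function singleFlight<T>(fetcher: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null
  return () => {
    if (inFlight) return inFlight
    inFlight = fetcher().finally(() => {
      inFlight = null
    })
    return inFlight
  }
}
```

Clearing the slot in `finally` is the detail the review calls out: if a rejection left the stale Promise in place, every subsequent call would rethrow the same error with no retry.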
Lines 733-745: Test internals exported unconditionally.

The `__modelInfoInternals` object exposes internal functions and cache mutation helpers in production builds. While useful for testing, this could allow unintended cache manipulation.

🧪 Optional: Guard test exports with environment check

```diff
-export const __modelInfoInternals = {
+export const __modelInfoInternals = process.env.NODE_ENV === "test" ? {
   getCachedNormalizedModelMetadata,
   normalizeModelInfoRecords,
   resolveModelMetadata,
   resetModelInfoCacheForTests: () => {
     modelInfoCache = null
     warnedMissingTokenLimitKeys.clear()
   },
   setModelInfoCacheForTests: (records: RawModelInfoRecord[]) => {
     warnedMissingTokenLimitKeys.clear()
     updateModelInfoCache(records)
   },
-}
+} : {} as typeof __modelInfoInternals
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/ai/fetchModels.ts` around lines 733 - 745, The __modelInfoInternals export currently exposes internal helpers (getCachedNormalizedModelMetadata, normalizeModelInfoRecords, resolveModelMetadata) and test-only cache mutators (resetModelInfoCacheForTests, setModelInfoCacheForTests) unconditionally; restrict this by only assigning/exporting __modelInfoInternals when running in a test environment (e.g., check process.env.NODE_ENV === 'test' or an isTestEnv flag) so production builds do not expose modelInfoCache or warnedMissingTokenLimitKeys mutation or the updateModelInfoCache helper; keep the same object shape for tests but ensure the conditional guards the export assignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/ai/fetchModels.ts`:
- Around line 338-343: The current fallback path returns modelInfoCache.rawData
when config.LiteLLMApiKey is falsy without re-checking freshness; update the
logic in fetchModels.ts so that when config.LiteLLMApiKey is missing you still
compute age = Date.now() - modelInfoCache.timestamp and only return
modelInfoCache.rawData if age < CACHE_TTL_MS; if the cache is stale, do not
return stale data and instead log a clear warning (including timestamp/age and
that the API key is missing) before proceeding to the error/empty response
path—apply this change where forceRefresh, modelInfoCache, CACHE_TTL_MS and
config.LiteLLMApiKey are used and replace the unconditional return of
modelInfoCache.rawData with the freshness-checked behavior and logging.
In `@server/api/chat/final-answer-synthesis.ts`:
- Around line 281-297: The safeInputBudget calculation in
estimateSafeInputBudget currently forces a minimum of 1024 which can exceed the
actual remaining prompt capacity; replace the Math.max(1_024, maxInputTokens -
reservedOutputTokens) logic so safeInputBudget is clamped to the actual
remaining budget (maxInputTokens - reservedOutputTokens) and not
inflated—compute remaining = maxInputTokens - reservedOutputTokens, then set
safeInputBudget = Math.max(1, remaining) (or Math.max(0, remaining) if you
prefer allowing zero) to ensure it never exceeds the real remaining capacity.
- Around line 757-765: The single-section fallback created by
buildDefaultSectionPlan currently uses section-only instructions that suppress
producing a full final answer; change buildDefaultSectionPlan’s returned section
objective/title to explicitly instruct producing the complete final answer when
it’s the only section (e.g., “Produce the full final answer using mapped
evidence”). Also update the prompt/template logic that emits the “Do not write
the full final answer” instruction (the code that composes section-level
instructions where the string "Do not write the full final answer" appears) to
conditionally skip that prohibition when the plan contains exactly one section
so the fallback yields a complete answer.
---
Nitpick comments:
In `@server/ai/fetchModels.ts`:
- Around line 456-468: getEffectiveMaxOutputTokens silently reduces
requestedMaxTokens to the model's maxOutputTokens (from getModelTokenLimits)
which can surprise callers; update getEffectiveMaxOutputTokens to notify when
clamping occurs by either (a) emitting a debug/process logger message that
includes the original requestedMaxTokens, the modelId, and the applied
maxOutputTokens when Math.min reduces the value, or (b) change the function
signature to return metadata (e.g., { effectiveMaxTokens, wasClamped, allowedMax
}) so callers can react; locate the logic in getEffectiveMaxOutputTokens and the
getModelTokenLimits call and implement one of these notification approaches
consistently across usages.
- Around line 72-74: modelInfoCache and warnedMissingTokenLimitKeys are
module-level mutable state that can race when multiple fetchModelInfoFromAPI
calls run concurrently; change the implementation to deduplicate and synchronize
in-flight fetches by storing an in-flight Promise (e.g.,
inFlightModelInfoPromise) keyed by model id or a single mutex-like Promise, so
fetchModelInfoFromAPI checks for an existing in-flight Promise and awaits it
instead of starting a new network call, only updating modelInfoCache and
warnedMissingTokenLimitKeys once the Promise resolves; ensure any rejection
clears the in-flight entry so subsequent calls can retry and keep references to
the unique symbols modelInfoCache, warnedMissingTokenLimitKeys, and
fetchModelInfoFromAPI when locating the code to modify.
- Around line 733-745: The __modelInfoInternals export currently exposes
internal helpers (getCachedNormalizedModelMetadata, normalizeModelInfoRecords,
resolveModelMetadata) and test-only cache mutators (resetModelInfoCacheForTests,
setModelInfoCacheForTests) unconditionally; restrict this by only
assigning/exporting __modelInfoInternals when running in a test environment
(e.g., check process.env.NODE_ENV === 'test' or an isTestEnv flag) so production
builds do not expose modelInfoCache or warnedMissingTokenLimitKeys mutation or
the updateModelInfoCache helper; keep the same object shape for tests but ensure
the conditional guards the export assignment.
In `@server/ai/provider/base.ts`:
- Around line 23-33: The returned modelId has an unreachable fallback:
`actualModelId` is already computed as `modelConfig?.actualName ||
resolvedModelId || defaultFastModel`, so remove the redundant `||
defaultFastModel` in the return object; update the return's `modelId` to just
use `actualModelId` (ensuring `resolvedModelId`, `modelConfig`, `actualModelId`,
and `defaultFastModel` remain unchanged).
In `@server/tests/fetchModels.test.ts`:
- Around line 94-111: Add a test that verifies behavior when the model ID is not
present in the cache: call __modelInfoInternals.setModelInfoCacheForTests([])
then call getModelTokenLimits with a nonexistent model ID and assert it returns
the generic defaults (maxInputTokens 128_000 and maxOutputTokens undefined);
also optionally call getEffectiveMaxOutputTokens with a sample requested output
(e.g., 1500) to assert it preserves the requested output and, if feasible,
assert a warning was emitted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: db99e037-ea90-4766-96fd-cee55546c6fd
📒 Files selected for processing (7)
- server/ai/fetchModels.ts
- server/ai/provider/base.ts
- server/api/chat/final-answer-synthesis.ts
- server/api/workflow.ts
- server/shared/types.ts
- server/tests/fetchModels.test.ts
- server/tests/finalAnswerSynthesis.test.ts
```diff
 if (!forceRefresh && modelInfoCache) {
   const age = Date.now() - modelInfoCache.timestamp
   if (age < CACHE_TTL_MS) {
-    return modelInfoCache.data
+    return modelInfoCache.rawData
   }
 }
```
Cache freshness check passes stale data when API key is missing.
When config.LiteLLMApiKey is falsy, the function returns cached raw data (line 347) without checking whether that cache is stale. This is inconsistent with the freshness check at lines 338-343.
If the API key becomes unavailable after initial caching, stale data could be served indefinitely.
🔧 Proposed fix to log staleness when falling back without API key
```diff
   if (!config.LiteLLMApiKey) {
-    Logger.warn("LiteLLM API key not configured, returning empty array")
+    if (modelInfoCache) {
+      const age = Date.now() - modelInfoCache.timestamp
+      Logger.warn(
+        { cacheAgeMs: age },
+        "LiteLLM API key not configured, returning cached data"
+      )
+    } else {
+      Logger.warn("LiteLLM API key not configured, returning empty array")
+    }
     return modelInfoCache?.rawData ?? []
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/ai/fetchModels.ts` around lines 338 - 343, The current fallback path
returns modelInfoCache.rawData when config.LiteLLMApiKey is falsy without
re-checking freshness; update the logic in fetchModels.ts so that when
config.LiteLLMApiKey is missing you still compute age = Date.now() -
modelInfoCache.timestamp and only return modelInfoCache.rawData if age <
CACHE_TTL_MS; if the cache is stale, do not return stale data and instead log a
clear warning (including timestamp/age and that the API key is missing) before
proceeding to the error/empty response path—apply this change where
forceRefresh, modelInfoCache, CACHE_TTL_MS and config.LiteLLMApiKey are used and
replace the unconditional return of modelInfoCache.rawData with the
freshness-checked behavior and logging.
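The freshness-checked fallback the prompt describes can be exercised in isolation. A self-contained sketch, assuming a simplified cache shape and a one-hour TTL (both illustrative):

```typescript
// Sketch only: record type, cache shape, and TTL are assumptions.
interface ModelInfoCacheEntry {
  rawData: string[]
  timestamp: number
}

const CACHE_TTL_MS = 60 * 60 * 1000 // assumed 1-hour TTL

// Without an API key, serve cached records only while they are fresh;
// a stale or missing cache yields an empty list instead of outdated data.
function fallbackWithoutApiKey(
  cache: ModelInfoCacheEntry | null,
  now: number = Date.now(),
): string[] {
  if (cache && now - cache.timestamp < CACHE_TTL_MS) {
    return cache.rawData
  }
  return []
}
```

This mirrors the review's point: the no-key branch should apply the same `age < CACHE_TTL_MS` test as the main path rather than returning `modelInfoCache?.rawData` unconditionally.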
```ts
function estimateSafeInputBudget(
  modelId: Models,
  maxOutputTokens?: number,
): { maxInputTokens: number; safeInputBudget: number } {
  const { maxInputTokens } = getModelTokenLimits(modelId)
  const effectiveMaxOutputTokens =
    getEffectiveMaxOutputTokens(
      modelId,
      maxOutputTokens ?? FALLBACK_OUTPUT_TOKENS,
    ) ??
    maxOutputTokens ??
    FALLBACK_OUTPUT_TOKENS
  const reservedOutputTokens = Math.ceil(
    effectiveMaxOutputTokens * (1 + FINAL_OUTPUT_HEADROOM_RATIO),
  )
  const safeInputBudget = Math.max(1_024, maxInputTokens - reservedOutputTokens)
  return { maxInputTokens, safeInputBudget }
}
```
Clamp safeInputBudget to the actual remaining budget.
Line 296 can inflate safeInputBudget above the real remaining capacity. For example, if a model only has a few hundred prompt tokens left after reserving output, this still forces the budget up to 1024, so decideSynthesisMode() can keep a payload that the provider will reject on context length.
Proposed fix
```diff
 function estimateSafeInputBudget(
   modelId: Models,
   maxOutputTokens?: number,
 ): { maxInputTokens: number; safeInputBudget: number } {
   const { maxInputTokens } = getModelTokenLimits(modelId)
   const effectiveMaxOutputTokens =
     getEffectiveMaxOutputTokens(
       modelId,
       maxOutputTokens ?? FALLBACK_OUTPUT_TOKENS,
     ) ??
     maxOutputTokens ??
     FALLBACK_OUTPUT_TOKENS
   const reservedOutputTokens = Math.ceil(
     effectiveMaxOutputTokens * (1 + FINAL_OUTPUT_HEADROOM_RATIO),
   )
-  const safeInputBudget = Math.max(1_024, maxInputTokens - reservedOutputTokens)
+  const remainingInputBudget = maxInputTokens - reservedOutputTokens
+  const safeInputBudget = Math.max(1, remainingInputBudget)
   return { maxInputTokens, safeInputBudget }
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
function estimateSafeInputBudget(
  modelId: Models,
  maxOutputTokens?: number,
): { maxInputTokens: number; safeInputBudget: number } {
  const { maxInputTokens } = getModelTokenLimits(modelId)
  const effectiveMaxOutputTokens =
    getEffectiveMaxOutputTokens(
      modelId,
      maxOutputTokens ?? FALLBACK_OUTPUT_TOKENS,
    ) ??
    maxOutputTokens ??
    FALLBACK_OUTPUT_TOKENS
  const reservedOutputTokens = Math.ceil(
    effectiveMaxOutputTokens * (1 + FINAL_OUTPUT_HEADROOM_RATIO),
  )
  const remainingInputBudget = maxInputTokens - reservedOutputTokens
  const safeInputBudget = Math.max(1, remainingInputBudget)
  return { maxInputTokens, safeInputBudget }
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/api/chat/final-answer-synthesis.ts` around lines 281-297, the safeInputBudget calculation in estimateSafeInputBudget forces a minimum of 1024 tokens, which can exceed the actual remaining prompt capacity. Replace the Math.max(1_024, maxInputTokens - reservedOutputTokens) logic: compute remaining = maxInputTokens - reservedOutputTokens, then set safeInputBudget = Math.max(1, remaining) (or Math.max(0, remaining) if you prefer allowing zero) so the budget is never inflated past the real remaining capacity.
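To make the failure mode concrete, here is a minimal standalone sketch of the two clamping strategies, using hypothetical token counts for a small-context model (not the project's actual helpers or limits):

```typescript
// Hypothetical numbers: reserved output nearly exhausts the context window.
const maxInputTokens = 8_192
const reservedOutputTokens = 7_900
const remaining = maxInputTokens - reservedOutputTokens // 292 tokens actually left

// Old behavior: the 1_024 floor inflates the budget past real capacity.
const oldBudget = Math.max(1_024, remaining) // 1024, more than the 292 available

// Suggested behavior: clamp to the true remaining capacity, floored at 1.
const newBudget = Math.max(1, remaining) // 292

console.log({ remaining, oldBudget, newBudget })
```

With these numbers the old formula would admit roughly 700 more input tokens than the window can actually hold alongside the reserved output.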
```typescript
function buildDefaultSectionPlan(): FinalSection[] {
  return [
    {
      sectionId: 1,
      title: "Answer",
      objective: "Provide the best complete answer using the mapped evidence.",
    },
  ]
}
```
Don’t use section-only instructions for the single-section fallback.
Lines 757-765 create a one-section fallback, but Lines 879-899 still tell the model “Do not write the full final answer” and that other sections are being generated in parallel. That makes the recovery path intentionally suppress a complete answer exactly when planning/mapping has already failed.
Proposed fix
```diff
 function buildSectionAnswerPayload(
   context: AgentRunContext,
   sections: FinalSection[],
   section: FinalSection,
   entries: FragmentAssignmentBatch[],
   imageFileNames: string[],
 ): { systemPrompt: string; userMessage: string; imageFileNames: string[] } {
+  const singleSection = sections.length === 1
   const sharedContext = buildSharedFinalAnswerContext(context)
   const sectionOverview = formatSectionPlanOverview(sections)
   const fragmentsText = formatSectionFragments(entries)
   const fragmentsSection = fragmentsText
     ? `Context Fragments For This Section:\n${fragmentsText}`
     : "Context Fragments For This Section:\nNone."
   const userMessage = [
     sharedContext,
-    `All Planned Sections (generated in parallel; a final ordered answer will be assembled later):\n${sectionOverview}`,
+    singleSection
+      ? ""
+      : `All Planned Sections (generated in parallel; a final ordered answer will be assembled later):\n${sectionOverview}`,
     `Assigned Section:\n${section.sectionId}. ${section.title}\nObjective: ${section.objective}`,
     [
       "Section Instructions:",
       "- Write only this section.",
-      "- Do not write the full final answer.",
-      "- Other sections are being generated in parallel.",
-      "- A final ordered answer will be assembled afterwards.",
+      ...(singleSection
+        ? ["- Deliver the complete final answer for this section."]
+        : [
+            "- Do not write the full final answer.",
+            "- Other sections are being generated in parallel.",
+            "- A final ordered answer will be assembled afterwards.",
+          ]),
       "- Avoid intro or outro language that assumes the whole answer is already visible.",
       "- Use the provided global fragment indexes exactly as shown for citations.",
     ].join("\n"),
     fragmentsSection,
   ]
     .filter(Boolean)
     .join("\n\n")
   return {
-    systemPrompt: buildBaseFinalAnswerSystemPrompt("section"),
+    systemPrompt: buildBaseFinalAnswerSystemPrompt(singleSection ? "final" : "section"),
     userMessage,
     imageFileNames,
   }
 }
```

Also applies to: 879-899
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/api/chat/final-answer-synthesis.ts` around lines 757-765, the single-section fallback created by buildDefaultSectionPlan uses section-only instructions that suppress producing a full final answer. Change the returned section objective/title to explicitly instruct producing the complete final answer when it is the only section (e.g., "Produce the full final answer using mapped evidence"). Also update the prompt logic that emits the "Do not write the full final answer" instruction to conditionally skip that prohibition when the plan contains exactly one section, so the fallback yields a complete answer.
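The conditional instruction assembly can be sketched in isolation. This is a simplified mock of the section-instruction builder (the FinalSection shape matches the snippet above; the builder function itself is hypothetical, not the project's code):

```typescript
interface FinalSection {
  sectionId: number
  title: string
  objective: string
}

// Simplified sketch: pick instructions based on whether the plan has
// exactly one section (the fallback case after planning/mapping fails).
function buildSectionInstructions(sections: FinalSection[]): string[] {
  const singleSection = sections.length === 1
  return [
    "- Write only this section.",
    ...(singleSection
      ? ["- Deliver the complete final answer for this section."]
      : [
          "- Do not write the full final answer.",
          "- Other sections are being generated in parallel.",
          "- A final ordered answer will be assembled afterwards.",
        ]),
    "- Use the provided global fragment indexes exactly as shown for citations.",
  ]
}

const fallbackPlan: FinalSection[] = [
  {
    sectionId: 1,
    title: "Answer",
    objective: "Provide the best complete answer using the mapped evidence.",
  },
]
const instructions = buildSectionInstructions(fallbackPlan)
console.log(instructions.join("\n"))
```

With a one-section plan, the prohibition lines are dropped and the model is told to produce the complete answer, which is exactly what the recovery path needs.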